Abstract

Asymptotically  efficient  (adaptive)  estimators  for  the  slope  parameters  of  the  linear 
regression  model  are  constructed  based  upon  the  "regression  quantile"  statistics  suggested  by 
Koenker  and  Bassett  (1978).  The  estimators  are  natural  analogues  of  the  adaptive  L- 
estimators  of  location  of  Sacks  (1974),  but  employ  kernel-density  type  estimators  of  the 
optimal  L-estimator  weight  function. 
1.  Introduction

The existence of asymptotically efficient estimators of a Euclidean parameter, $\beta$, in the
presence of an infinite-dimensional nuisance parameter, $F$, has attracted considerable recent
attention. The problem, formulated by Stein (1956), of estimating $\beta$ asymptotically as well
when $F$ is unknown as when it is known has been treated in increasing generality. In a
remarkable confluence of papers, Beran (1974), Sacks (1974), and Stone (1975) independently
proposed adaptive R-, L-, and M-estimators, respectively, of the center of symmetry of an
unknown (symmetric) distribution. In his 1980 Wald lecture, Bickel (1982), developing the
approach of Stein (1956), extended adaptation to a broad array of problems. In particular, he
proposed an adaptive M-estimator for the parameters of the linear model

$$y_i = x_i'\beta_0 + u_i \tag{1.1}$$

with $\{x_i = (x_{i1}, \ldots, x_{ip})'\}$ a sequence of known $p$-vectors, $\beta_0 \in R^p$ an unknown regression
parameter to be estimated, and $\{u_i\}$ a sequence of independent random variables with common
distribution function $F$. When $F$ is symmetric, Bickel constructed an adaptive estimator
of the entire vector $\beta_0$. Dropping the symmetry condition, he further showed that if the design
contains an intercept, that is, $x_i' = (1, \tilde{x}_i')$, so that

$$y_i = x_i'\beta_0 + u_i = \alpha + \tilde{x}_i'\gamma + u_i \tag{1.2}$$

then the $(p-1)$-vector of "slope" parameters can be adaptively estimated. Manski (1984)
reviews these results, and offers some extensions to non-linear regression models. Newey
(1987)  has  recently  proposed  adaptive  method-of-moment  type  estimators  for  the  linear  model 
which  are  asymptotically  efficient  under  rather  weak  regularity  conditions.  Hogg  (1980)  has 
also  proposed  various  partially  adaptive  methods  based  on  M-estimators,  and  de  Jongh  and  de 
Wet  (1986)  have  recently  suggested  an  adaptive  choice  of  the  trimming  proportion  for 
trimmed  least  squares  estimators. 


In  this  paper  we  propose  fully  adaptive  L-estimators  for  the  slope  parameters  of  the 
linear  model,  under  the  least  restrictive  assumptions  possible  on  F  (needed  only  to  make  the 
asymptotic  efficiency  well  defined).  These  results  extend  results  of  Sacks  (1974)  to  the  case  of 
linear  regression  and  Koenker  and  Portnoy  (1987)  to  the  adaptive  case.  In  the  remainder  of 
this section, we introduce notation and state our main results. Section 2 gives a detailed
treatment of our construction of the adaptive estimator. Section 3 treats the problem of
constructing a satisfactory estimate of the score function. Section 4 constructs a practical version of an
adaptive L-estimator and describes a small Monte Carlo experiment designed to evaluate the
performance of the adaptive L-estimator in moderate-sized samples. We conclude that a
practical adaptive L-estimator can be constructed for the slope parameters of the linear model.
The  estimator  achieves  high  finite-sample  efficiency  in  a  wide  variety  of  error  situations  and 
outperforms  standard  robust  methods  in  all  situations  we  investigated.  Substantial  gains  in 
efficiency  are  achieved  relative  to  simpler  robust  procedures  in  asymmetric  error  situations. 

Let $X_n$ denote the $(n \times p)$-matrix with $i$-th row $x_i'$; we will assume throughout that
$n^{-1} X_n' X_n \to Q$, a positive definite matrix. The Euclidean norm of $x$ will be denoted $\|x\|$ and
$\lambda_1(M)$ will denote the largest eigenvalue of the matrix $M$. We will focus attention on Bickel's
(1982) example 3: the linear model (1.2) with an explicit intercept and without any symmetry
condition on $F$. We also assume that the means have been subtracted in $X_n$ so that $\sum \tilde{x}_i = 0$,
where $\tilde{x}_i$ is the last $(p - 1)$ coordinates of $x_i$. Thus, if $Q$ is partitioned so that $\tilde{Q}$ is the lower
$(p - 1) \times (p - 1)$ corner, $\tilde{Q}^{-1}$ is the corresponding corner of $Q^{-1}$. The following regularity
condition on the sequence of designs $\{X_n\}$ will be maintained.

Condition  X:  There  exist  positive  constants  b, b,b,  and  c,  such  that 
(1.)    UQ-n-lXnXn)<bn-l* 

(2.)    £IWI3<^ 


(3.)    max,-  |*|  <  fin* 

(4.)     inf  # {/:  b  <  x-S  <b)>  en 

IM=i 

In  Portnoy  (1984)  it  is  shown  that  such  conditions  are  satisfied  for  a  broad  class  of  random 
designs,  as  well  as  for  ANOVA  designs  when  the  number  of  observations  per  cell  tends  to 
infinity.  On  F  we  require  only: 

Condition F: $F$ is absolutely continuous with finite, non-zero Fisher information $I(F)$.

Our methods are based on the regression quantiles of Koenker and Bassett (1978), which
solve, for $t \in [0,1]$,

$$\min_{b} \sum_{i=1}^{n} \rho_t(y_i - x_i'b), \tag{1.3}$$

where $\rho_t(u) = u(t - I(u < 0))$. Let $\{\hat\beta_n(t) = (\hat\alpha_n(t), \hat\gamma_n(t))\}$ denote the sequence of regression
quantile processes so defined. In the Appendix, a uniform Bahadur representation with explicit
remainder is established for $\hat\beta_n(t)$. This result strengthens somewhat similar results of
Jurečková and Sen (1984) and Koenker and Portnoy (1987).
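The objective in (1.3) can be illustrated in the simplest special case, the pure location model with $x_i = 1$, where the regression quantile reduces to an ordinary sample quantile. The following is a minimal sketch under that assumption only; the function names `check_loss` and `location_quantile` are hypothetical, and the brute-force search over data points is for illustration rather than the linear programming method one would use for a full design matrix.

```python
def check_loss(u, t):
    """Check function rho_t(u) = u * (t - 1{u < 0}) from (1.3)."""
    return u * (t - (1.0 if u < 0 else 0.0))


def location_quantile(y, t):
    """Minimise sum_i rho_t(y_i - b) over b in the intercept-only model.

    The objective is piecewise linear and convex in b, so a minimiser is
    attained at one of the observations; we simply search over them.
    """
    return min(sorted(y), key=lambda b: sum(check_loss(yi - b, t) for yi in y))


y = [2.0, 7.0, 1.0, 9.0, 4.0]
median = location_quantile(y, 0.5)   # minimiser of the t = 0.5 objective
upper = location_quantile(y, 0.9)    # minimiser of the t = 0.9 objective
```

With $t = 0.5$ the check function is half the absolute value, so the minimiser is the sample median; other values of $t$ tilt the loss asymmetrically and move the minimiser toward the corresponding order statistic.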

Our adaptive estimator, $T_n$, of $\gamma$ is a linear function of $\hat\beta_n(t)$; that is, we consider

$$T_n = \int_0^1 \hat\gamma_n(t)\,\hat{J}_n(t)\,dt \tag{1.4}$$

where $\hat{J}_n(t)$ is an estimate of the optimal score function

$$J_0(t) = \psi'(F^{-1}(t)),$$

where $\psi(x) = -L'(x)$ and $L(x) = \ln f(x)$. Theorem 2.1 provides conditions on $\hat{J}_n(t)$ which
make $T_n$ adaptive for any $F$ satisfying Condition F. A kernel estimator $\hat{J}_n(t)$ is constructed in
Section 3 which satisfies the condition of Theorem 2.1, verifying our claim. Some further
remarks on practical aspects of estimating $J_0(t)$ are contained in Section 4.
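As a worked instance of the definition $J_0(t) = \psi'(F^{-1}(t))$ with $\psi = -L'$ and $L = \ln f$, the sketch below evaluates the optimal score function for standard Cauchy errors in two ways: by a numerical second difference of $\ln f$, and by the closed form $2(1 - Q^2)/(1 + Q^2)^2$ that follows from differentiating $\ln f$ twice. The function names and the step size `h` are illustrative assumptions.

```python
import math


def cauchy_quantile(t):
    """Quantile function Q(t) = F^{-1}(t) of the standard Cauchy law."""
    return math.tan(math.pi * (t - 0.5))


def log_density(x):
    """L(x) = ln f(x) for the standard Cauchy density."""
    return -math.log(math.pi * (1.0 + x * x))


def score_weight_numeric(t, h=1e-4):
    """J_0(t) = -L''(F^{-1}(t)), using a central second difference for L''."""
    x = cauchy_quantile(t)
    second = (log_density(x + h) - 2.0 * log_density(x) + log_density(x - h)) / (h * h)
    return -second


def score_weight_closed_form(t):
    """Closed form 2(1 - Q^2) / (1 + Q^2)^2 obtained by differentiating ln f."""
    q = cauchy_quantile(t)
    return 2.0 * (1.0 - q * q) / (1.0 + q * q) ** 2
```

The weight is largest at the median and becomes negative once $|Q(t)| > 1$, which shows why the estimated weight function must be handled carefully in the tails.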


Our  estimator  of  the  optimal  score  function  is  based  on  the  estimators  of  the  conditional 
quantile  and  conditional  distribution  functions  introduced  in  Bassett  and  Koenker  (1982). 
Denoting the set of solutions to (1.3) by $B_n(t)$, we may define a natural estimator of the $t$-th
conditional quantile of $Y$ given $x$ as

$$\hat{Q}_n(t \mid x) = \inf\{x'b \mid b \in B_n(t)\}. \tag{1.5}$$

Correspondingly,

$$\hat{F}_n(y \mid x) = \sup\{t \in [0,1] \mid \hat{Q}_n(t \mid x) \le y\} \tag{1.6}$$

affords a natural estimator of the conditional distribution function. At the mean of the design,
$\bar{x} = n^{-1}\sum x_i$, $\hat{Q}_n(u \mid \bar{x})$ is a proper quantile function (a non-decreasing, left-continuous step
function on $u \in [0,1]$; see Bassett and Koenker (1982), Theorem 2.1), so $\hat{F}_n(y) = \hat{F}_n(y \mid \bar{x})$ is a
proper distribution function (a non-decreasing, right-continuous step function on $y \in R$). $\hat{F}_n$
behaves asymptotically exactly like a sample distribution function (see Portnoy (1984)). The
results of Section 3 give methods of estimating $J_0(t)$, based on $\hat{F}_n(y)$, which satisfy the
conditions for adaptation of $T_n$ given in Section 2.
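The inversion in (1.6) is easy to carry out once the quantile function is a step function. The sketch below assumes the quantile function is stored as breakpoints and step values (a hypothetical representation chosen for illustration) and returns $\sup\{t : Q(t) \le y\}$, with the convention that the result is 0 when no step value lies at or below $y$.

```python
def cdf_from_step_quantile(breaks, values, y):
    """Evaluate F(y) = sup{t in [0, 1] : Q(t) <= y} for a step quantile function.

    `breaks` holds 0 = t_0 < t_1 < ... < t_m = 1, and Q(t) = values[j] on the
    interval (breaks[j], breaks[j + 1]]; `values` must be non-decreasing.
    """
    F = 0.0
    for j, q in enumerate(values):
        if q <= y:
            F = breaks[j + 1]   # the whole j-th step lies at or below y
        else:
            break               # later steps are larger still
    return F


breaks = [0.0, 0.25, 0.6, 1.0]
values = [-1.0, 0.5, 2.0]
```

Because the quantile function is left-continuous and non-decreasing, the resulting function of $y$ is a right-continuous step function with jumps at the step values, as stated above.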

2.  The Adaptive Estimators

In order to treat asymptotics for L-estimators it is necessary to have smooth, positive
densities. Following Stone (1975), this may be accomplished in great generality by convolving
the original error distribution with a vanishingly small smooth contaminant. In particular,
define

$$\tilde{u}_i = u_i + \frac{W_i}{s} + \frac{W_i'}{t}, \qquad \tilde{Y}_i = x_i'\beta + \tilde{u}_i \tag{2.1}$$

where $\{W_i\}$ and $\{W_i'\}$ are independent i.i.d. sequences (independent of $u_i$) with density

$$g(w) = \frac{c}{(1 + p(w))^2}, \qquad -\infty < w < \infty. \tag{2.2}$$

Here $p(w)$ is an even, three times continuously differentiable, positive function, increasing on
$[0, 1]$, with $p(w) = |w|$ for $|w| \ge 1$. Let $G(w)$ denote the c.d.f. corresponding to $g$, and define
(for $F \in \mathcal{F}$)

$$\begin{aligned}
f_t(x) &= t \int g(t(x - y))\,dF(y), & f_s(x) &= s \int g(s(x - y))\,dF_t(y), \\
F_t(x) &= \int G(t(x - y))\,dF(y), & F_s(x) &= \int G(s(x - y))\,dF_t(y).
\end{aligned} \tag{2.3}$$

That is, $f_s$ and $F_s$ are the density and c.d.f. of $\tilde{u}_i$, and $f_t$ and $F_t$ are the density and c.d.f. of
$u_i + W_i'/t$. Lastly, define for fixed $\eta < 1/2$ and arbitrary $\delta > 0$,

$$s_n = (\log n)^{\eta}, \qquad t_n = (\log n)^{\delta}. \tag{2.4}$$

Note that the subscript $n$ on $s_n$ and $t_n$ will often be suppressed.
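To get a feel for how slowly the logarithmic rates in (2.4) move, the following sketch tabulates the perturbation scales $1/s_n$ and $1/t_n$ for a few sample sizes. The exponent values 0.2 are purely illustrative assumptions, not values taken from the paper, and `smoothing_scales` is a hypothetical helper name.

```python
import math


def smoothing_scales(n, eta, delta):
    """Return (s_n, t_n) = ((log n)^eta, (log n)^delta) as in (2.4)."""
    log_n = math.log(n)
    return log_n ** eta, log_n ** delta


# Illustrative exponents only; the perturbations W/s_n and W'/t_n shrink
# at these logarithmic rates.
for n in (100, 10_000, 1_000_000):
    s_n, t_n = smoothing_scales(n, eta=0.2, delta=0.2)
    print(n, round(1.0 / s_n, 3), round(1.0 / t_n, 3))
```

Even at a million observations the scale factor is still more than one half under these exponents, which is one reason a practical implementation may prefer direct smoothing to literal random perturbation.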

Furthermore, since the uniform Bahadur representation (Theorem A.1) holds only on a
compact subinterval of $[0, 1]$, the interval of integration must also be restricted to a
subinterval. Thus, for fixed $\delta > 0$, $0 < \varepsilon < \delta + \eta < 1/2$, and $a < 1/2$, define (for $F \in \mathcal{F}$)

$$\begin{aligned}
\alpha_n &= (\log n)^{-\varepsilon} + F_t(-\tfrac12 (\log n)^{\delta}) + 1 - F_t(+\tfrac12 (\log n)^{\delta}), \\
\hat\alpha_n &= n^{-a} + (\log n)^{-\varepsilon} + F_n^*(-\tfrac12 (\log n)^{\delta}) + 1 - F_n^*(+\tfrac12 (\log n)^{\delta}),
\end{aligned} \tag{2.5}$$

where $F_n^*$ is the Koenker-Bassett c.d.f. estimator (see (1.6)) based on observations $Y_i + W_i'/t_n$.
Also let $\hat{F}_n$ denote the Koenker-Bassett c.d.f. estimator based on observations
$\tilde{Y}_i = y_i + W_i/s_n + W_i'/t_n$ (that is, $\hat{F}_n$ estimates $F_s$). Now define the adaptive (slope parameter)
estimator:

$$T_n = \frac{\int_{\hat\alpha_n}^{1-\hat\alpha_n} \hat\gamma_n(t)\,\hat{J}_n(t)\,dt}{\int_{\hat\alpha_n}^{1-\hat\alpha_n} \hat{J}_n(t)\,dt} \tag{2.6}$$

where $\hat{J}_n(t)$ is any appropriately consistent estimator of the score function
$J_s(t) = -L_s''(F_s^{-1}(t))$. An appropriate example (satisfying (2.7)) generated by kernel estimation based
on $\hat{F}_n$ is given in Section 3.

Theorem 2.1 Let $\hat{J}_n(t)$ be an estimator of $J_s(t)$ satisfying

$$\int_{\alpha_n}^{1-\alpha_n} |\hat{J}_n(t) - J_s(t)|\,dt = o_p((\log n)^{-(2\delta+\eta)}). \tag{2.7}$$

Then for any $F \in \mathcal{F}$

$$\sqrt{n}\,(T_n - \gamma) \xrightarrow{D} N_{p-1}(0, \tilde{Q}^{-1}/I(F)),$$

where $I(F)$ is the Fisher information for $F$.

This theorem will be proved after some preliminary properties of $f_s$ are developed. The
following Lemmas each assume the hypotheses of Theorem 2.1 and that $F \in \mathcal{F}$.

Lemma 2.1 Given $\alpha_n$ defined by (2.5), define

$$x_n = \max\{-F_s^{-1}(\alpha_n),\ F_s^{-1}(1 - \alpha_n)\}. \tag{2.8}$$

Then there is a constant $c^*$ such that for $B_n \le (\log n)^{\delta}$,

$$\inf\{f_s(x) : -x_n - B_n \le x \le x_n + B_n\} \ge c^* (\log n)^{-(2\delta+\eta)}.$$
Proof.  Note  that  (by  (2.2)), 


F.(-xJ  =  /  G(s(xn  -y))  dFt(y)  <  C(-%  xn)  +  P{\u  +  W ft  \  >  %xn) 


+  Ft(-Kxn)+  l-FtVkxm) 


(2.9) 


(1  +  %5Xj 


and  a  similar  inequality  holds  for  1  -  F,(xn).  Hence,  from  (2.8)  an  =  Fe{-xn)  or 
an  =  \  -  F,(xn);  and,  if  xn  were  larger  than  (log  n)6,  (2.9)  would  be  contradicted  by  (2.5)  (for 
n  large  enough).  Thus,  it  follows  that  (for  n  large  enough), 

0  <  xn  <  (log  n)6      and      xn  -*  +oo  (2.10) 

(since  an  -»  0  by  (2.5))/  Now  (for  x  >  0) 

/.(*)  =  / r^Ft(j;)>  - -P{m  +  JT//  <  0}, 

J    (\+p(s(x-y))r       tKy)       (1  +p(sx))2 


and,  hence,  for  \x\  <  2(log  nf,  with  c '  =  P{u  +  W/t  <  0}, 


/.(*)> 


(1  +p(2s  (log  nf))2       (1  +25(log«)d")2 

and  the  result  follows  from  (2.4).    ■ 

Lemma 2.2. For constants $c_\nu$ ($\nu = 0, 1, 2, 3$) with $c_0 = c$ in (2.2),

$$|f_s^{(\nu)}(x)| \le c_\nu s^{\nu+1} \qquad\text{and}\qquad |f_t^{(\nu)}(x)| \le c_\nu t^{\nu+1}$$

uniformly in $x$.

Proof. Differentiate $f_s(x)$ or $f_t(x)$ (see (2.3)) under the integral and use the fact that
derivatives of $p$ are uniformly bounded. ■

Lemma 2.3. $f_s(x_n) \to 0$ and $f_s'(x_n) \to 0$ as $n \to \infty$.
Proof.  As  in  (2.9)  (using  (2.4)  and  (2.10)), 

l/-C*.)l  ^  7i — ? — *  +  ?  (*"t(-%*J  +  *  -  Ft(%*J}  -  0 

(1  +1ksxn)2 

!/•'<*•>!  ^    ,,    V2     ,3  +  c*{ft(-^xn)  +  1  -  Ft(Kxn))  -  0.       ■ 

|1    +  %SXn\3 

Lemma 2.4. $\int_{\alpha_n}^{1-\alpha_n} J_s(t)\,dt \to I(F)$ as $n \to \infty$.

Proof.    A  slight  modification  of  the  proof  of  Theorem  4.1  in  Stone  (1975)  provides  the 
result  here.  ■ 

Lemma 2.5. As $n \to \infty$,

$$\int_{\alpha_n}^{1-\alpha_n} |\hat{J}_n(t)|\,dt = O_p(s_n^2) = O_p((\log n)^{2\eta}).$$

Proof.  By  condition  (2.7),  we  need  only  consider 


1  -a. 


J     \J.(t)\dt  <   J    \L,"(x)\f.(x)dx 

(2.11) 

xn  xn 

<   /    \/;\x)\dx  +   J   (L.Xx))2f.{x)dx 

Now  differentiating  /„  in  (2.3)  twice  (under  the  integral)  and  using  the  fact  that  derivatives  of 
p  are  bounded, 

(1  +  p(s(x  -y)))3  I  +  p(s(x  -y))y 

*  CiS2  I  n 7T ^  dFt(y)  =  C^f^x)- 

(1  +  p(s(x  -y)))2 

Hence,  the  first  term  in  (2.11)  is  0(s2).  The  last  term  in  (2.11)  converges  to  1(F)  by  Lemma 
2.4;  and,  thus,  the  desired  result  follows.  ■ 

Lemma 2.6 Let $\alpha_n$ and $\hat\alpha_n$ be given by (2.5) and assume $F \in \mathcal{F}$. Then, with probability
tending to one (with $a < 1/2$ as defined in (2.5)),

$$\alpha_n \le \hat\alpha_n \le \alpha_n + 2n^{-a}.$$

Proof. By Proposition 3.1, $|F_n^*(x) - F_t(x)| = O_p(n^{-1/2})$ uniformly for
$F_s^{-1}(\alpha_n) \le x \le F_s^{-1}(1 - \alpha_n)$. By equations (2.10) and (2.8), $\pm\tfrac12 (\log n)^{\delta}$ lies in this interval,
and the result follows immediately. ■

Lemma 2.7. As $n \to \infty$, $\int_{S_n} |J_s(t)|\,dt = O(n^{-a/2})$, where $S_n = (\alpha_n, \hat\alpha_n) \cup (1 - \hat\alpha_n, 1 - \alpha_n)$.

Proof.  Following  the  argument  of  Lemma  2.5  and  using  Lemma  2.6,  with  probability 
tending  to  one 

J  |/.(/)|  <//  <  CxJVrt*.  +  2/i-)  -  F.-*(a.))  +  /  (I/(x))2  /.(*)  flfx 

From Lemma 2.1 (and the mean value theorem) the first term is of order $O((\log n)^b n^{-a}) =
O(n^{-a/2})$. A similar argument shows that the second term has this same order. The same
argument also applies to the integral from $(1 - \hat\alpha_n)$ to $(1 - \alpha_n)$. ■


Proof of Theorem 2.1. From (2.6)


l-a. 


^(Tn-1)  =  ^ — 

/    Jn(t)dt 


B„ 


(2.12) 


By  Lemmas  2.4  and  2.7  and  condition  (2.7),  the  denominator,  Bn,  tends  to  1(F)  in  probability; 
so  it  remains  to  consider  the  numerator.  Define 

Un  =  -7=r  Q'1  £  XiKin(t ),     Kin{t )  =  t  -  I (%  <  F~\t ))  (2.1 3) 

v"  .=1 


Then,  by  Theorem  A.l, 

V#T  |7„(0  -  1  -  -)=Un{t){f,{F-\t)))-'\  <  (h"1/4  (log  n)B(X,  F.)  +  h^M*))//.^/)) 

on  (an,  1  -  an)  except  with  probability  bounded  by  q(X,  F)  (see  Lemma  (A. 3)).   By  Lemmas 
2.1  and  2.2,  uniformly  on  {an,  1  -  an), 


B(X,Ft) 
f,{Fr\t)) 


=  0((\ogn)r>+3l26  +  rt),      q(X,F)  =  Q 


1 


v^ 


ec  64(log«)2(6+,,) 


0. 


Therefore (using Theorem A.1), with probability tending to one, the numerator in (2.12)
satisfies (since $\alpha_n \le \hat\alpha_n$ in probability)




1  -ot„ 


\An-      J     Un{t)J,{t)/f,(Fa-\t))dt\ 

/       W«(t)\dt 


1  -a„ 

I  7   (t\\    At 

1  -«« 


sup,  |  £/_(/)  | 

^OpOr^dogw))     ."       -r-7-z     +    .  ,    ,,,        /    |J.(/W.(OI<ft 
mfan/g(x)  infaji/,(0       aJn 

(2.14) 

supt  I  £/»(*)  I    f       .... 
+  ^~7 — 7777"  U    ly«(/)l  ^ 

where     Sn      =     (an,  aj  \J  (1  -  a„,  1  -  an).       By     condition     (2.7)     and     Lemma     2.5, 

/     \Jn{t)\  dt  =  Op(s£);  and,  hence,  by  (2.4)  and  Lemma  2.1,  the  first  term  in  (2.14)  tends  to 

zero  in  probability.  Using  an  invariance  principle  for  Un(t)  (e.g.,  see  Koul  (1969),  Theorem 
A. 3),  sup  t  \Un(t)\  =  Op(l).  Thus,  combining  Lemma  2.1  and  condition  (2.7),  the  second  term 
in  (2.14)  also  tends  to  zero  in  probability.  Lastly,  the  third  term  converges  to  zero  by  Lemmas 
2.1  and  2.7.  Therefore,  the  right  side  of  (2.14)  tends  to  zero  in  probability;  and  it  remains  to 
consider 

vn=  J  un{t)j,{t)/f.{Fr\t))dt. 

Fix  t  e  Rp  and  consider  /  'Vn.  define  ain  =  /  'Q~lXi/yfn  .  Then 

E  a£-*t'Q-*t    as    n  -  oo.  (2.15) 

«=i 

and $t'V_n$ is a weighted sum of $n$ i.i.d. random variables (see (2.13)):

«         l-» 
t'Vn  =  E  ain    J    Ktn(t)J,(t)/  /.(F.-HO)  dt 

To  apply  the  Liapounov  Central  Limit  Theorem,  compute  third  moments:  since  \Kin(t)\  <  2 




£|/'KJ3<8£  \atn\- 


!-<*« 


J     \J,{t)\/f,{F,-\t))dt\   <£  \ain\3O((\0gn)2^) 


where  Lemma  2.1  and  Lemma  2.5  are  applied.   Lastly,  from  condition  X3,  the  definition  of 
ain,  and  (2.15), 

E  \t '  Vn  | 3  <  Y,  atlO(n  "^(log  nf )  —  0    as    n  -+  oo. 

So by Liapounov's theorem (e.g., see Breiman (1968), p. 275), it remains to check that the
variance converges to $t'Q^{-1}t \cdot I(F)$. Direct calculation gives


l-a_  !-<*„ 


fQ-1!  L     L    f.(F.-L('))/.(F.-i(f)) 


=  /  /  {min  {F.(x)tF.(y))-F.(x)F.(y))L~(x)L~(y)dx  dy 

(where  xn  =  F„_1(Q!n),  yn  =  F,-1(l  -  an)).  Let  ag  denote  the  above  double  integral  with 
xn  =  -oo  and  yn  =  oo.  Then  a,  -  an  can  be  expressed  as  the  sum  of  integrals  over  rectangles 
disjoint  from  (xn,  yn)  x  (xn,  yn).  Consider  one  such  integral:  the  integral  over 
(-oo,  xn)  x  (xn,  oo).   Integrating  by  parts, 

xn    oo 

I  /   /  F,{y)  (1  -  F.(x))  L~(x)  L~(y)  dx  dy  | 

-°°xn 

=  |(-(1  -  F,{xn))Lt\xn)  -  fg(xn)){F,(xn)L,\xn)  -  f.{xn))\ 


<  (L;(xn)?F,(xn)  +  \L.Xxn)\f,(xn)+f?(xn). 


By    Lemma    2.3,    /,2(x J  -  0    and    |L/(xn)|/.(xJ  =  |//(xj|  -+ 0    as    n  -  oo.     Also    by 
L'Hospital's  rule, 
lim  F,(.xn)(L;(xn))2  =  lim  F.(xJLa\xJ  lim  J


l\m  Ft(xn)L,\xn)  •  lim 


/.(*«) 
/.(*.) 


=  iim/;uj  =  o, 

by  Lemma  2.3.   Treating  other  contributions  to  \an  -  a„  |  similarly,  we  see  that  \<rn  -  <rt  |  -+0 

oo 

as  n  -+  oo.    But  integrating  by  parts,  a,  =  j  L~(x)  f  t(x)  dx  — ►  1(F)  as  n  — ►  oo  (by  Lemma 

-co 

D 

2.4).  Therefore,  cr„  — »  1(F);  and,  hence,  Vn  — ►  Np_t  (0,  1(F)  Q~l).  As  noted  above,  this  implies 

$A_n$ (in (2.12)) has the same limiting distribution. Therefore $A_n/B_n \xrightarrow{D} N_{p-1}(0, \tilde{Q}^{-1}/I(F))$, and
the proof is complete. ■

3.  An Appropriate Estimator of the Score Function

Here, as in Section 2, we assume that the errors are distributed according to $F_s$ defined
in (2.3) with $F \in \mathcal{F}$. For such smooth $F_s$, it is relatively easy to construct an estimator, $\hat{J}_n(t)$,
of $J_s(t)$ satisfying (2.7) by using appropriate density estimators based on $\hat{F}_n$. Since (2.7)
requires only logarithmic convergence, the following conditions on the density estimators will
be seen to be sufficient. Let $s_n = (\log n)^{\eta}$ (as in (2.4)),

Un  =  {x:  F.-*(a.)  -  B  <  x  <  F.-\\  -  a.)  +  5}  (3.1) 

for     any     constant     B,     and     define     ATn     =     (log  n  )-<2**>)     (so     that,     by     Lemma    2.1, 
{inf(/n  /,(x)}-1  =  0(l/Kn)).    Suppose  there  are  density  estimators,  fn(x)  (with  derivatives 

/»(■*)),  and  (smooth)  c.d.f.  estimators,  Fn(x)  (generally  the  integral  of  /„)  such  that  for  v  =  0, 
1,2,3, 

supir.  \Uv\k)  -  /y](x)\  =  op«sJKn)-*)  =  op((log  «)-">(*"))  (3.2) 

supn    \F,-\F.(x))  -x\=  oMsJKJ-4)  =  op((log  «)-*(**»>)  (3.3) 
(where $f_s$ and $F_s$ are given by (2.3)). As in Section 2, define $L_s(x) = \log f_s(x)$,
$\hat{L}_n(x) = \log \hat{f}_n(x)$, $J_s(t) = -L_s''(F_s^{-1}(t))$, and $\hat{J}_n(t) = -\hat{L}_n''(\hat{F}_n^{-1}(t))$.
Lemma 3.1. If (3.2) holds, then

$$\sup_{U_n} |\hat{L}_n''(x) - L_s''(x)| = o_p((\log n)^{-(2\delta+\eta)}).$$

Proof. First note that by (3.2) and Lemma 2.1,

infVn/n(x)>cX-op(l).  (3.4) 

Hence,  {infij  /n(x)}_1  =  Op(l /£"„).  Similarly,  by  (3.2)  and  Lemma  2.2,  we  also  have 

\fiv\x)\  <cj»*      for      xeUn.  (3.5) 


Therefore,  letting  A/  denote  absolute  differences  between  /„  and  /,  (and  their  derivatives), 
and  (with  n  suppressed)  writing  L  "{x )  =  /  "(x )//  (x )  -  (/  '{x )//  (x  ))2, 


sup.    \Ln\x)-L,\x)\  <0 


A/  :        sup  /  -A/        sup(/  -  +  /  ')A/  - 
K  K2  K2 


sup  /  '2  sup(/  +  /  )A/ 
AT4 


=  Or 


k: 


K. 


=  op(Kn). 


Theorem 3.1. If (3.2) and (3.3) hold, then

$$\int_{\alpha_n}^{1-\alpha_n} |\hat{J}_n(t) - J_s(t)|\,dt = o_p((\log n)^{-(2\delta+\eta)}).$$

Proof. Changing variables using $t = F_s(x)$ and letting $x_n = F_s^{-1}(\alpha_n)$, $y_n = F_s^{-1}(1 - \alpha_n)$,
yn


J     \Jn(0-JAt)\  dt  =  /  \L;(F-\F,(x)))-L;(x)\  f.(x)dx 


<  /  iCU)  -  L~{x)\  f.{x)dx  +  /  |       /        L  '"  (u)du  |  f.(x)dx. 

xn  xn  x 

The  first  term  has  the  desired  order  by  Lemma  3.1.  By  Lemmas  2.1  and  2.2  and  (3.4)  and 
(3.5), 

$$\sup_{U_n} |\hat{L}_n'''(x)| = O_p(s_n^4/K_n^3). \tag{3.6}$$

Hence, by condition (3.3) and (3.6), the inner integral in the second term above is
$O_p(s_n^4/K_n^3) \cdot o_p((s_n/K_n)^{-4}) = o_p(K_n)$; and the result follows. ■

Lastly, estimates $\hat{f}_n$ satisfying (3.2) and (3.3) need to be constructed. In fact, it is
generally easy to construct estimates where the error terms are even smaller than those required
in conditions (3.2) and (3.3). For example, if there is a c.d.f. estimator, $\hat{F}_n(x)$, satisfying

$$\sup_{U_n} |\hat{F}_n(x) - F_s(x)| = O_p(n^{-a}) \qquad\text{for some}\qquad a > 0 \tag{3.7}$$

then kernel estimators satisfying (3.2) and (3.3) can be constructed (and similarly for
estimating $F_t$). We first show that (3.7) holds for $a = 1/2$ for the Koenker-Bassett c.d.f. estimator, $\hat{F}_n$,
given by (1.6) based on observations $\tilde{Y}_i$. However, it is no harder to show that the empirical
distribution of residuals from any estimator, $\hat\beta$ (with $\hat\beta$ consistent at rate $n^{-a}$), will also satisfy
(3.7).

Proposition 3.1: Assume that the result of Theorem A.1 holds. Then condition (3.7) holds for
$F \in \mathcal{F}$ with $a = 1/2$.

Proof.  By  Theorem  A.l  and  Lemma  2.1  and  2.2, 

sup  \Fn(x)-F,(x)\  <sup  |-£  r(Ui<x)-F.(x)\  +  Op(n^  (log  n)b) 

»  Un        n    »=i 

for some $b > 0$. By Kolmogorov's result (e.g., see Breiman (1968), p. 287) the sup on the right
is $O_p(n^{-1/2})$; and, hence, (3.7) holds. The same argument works for $|F_n^*(x) - F_t(x)|$ where $F_n^*$
is based on $Y_i + W_i'/t$. ■

Now, let $k(x)$ be a kernel which is a (symmetric) density with support in $[-1, 1]$ such
that $|k^{(\nu)}(x)| \le b$ (for some $b > 0$) uniformly for all $x$ and $\nu = 0, 1, 2, 3, 4$. Given $\hat{F}_n(x)$
satisfying (3.7) define


/»(*)  =  /•„  /  k(rn(x  -y)dFn{y) 

— OO 
X 

Fn(x)=  /  fn{x)dx, 


(3.8) 


where

$$r_n = n^{a_0} \qquad\text{with}\qquad a_0 < a/4. \tag{3.9}$$

Lemma  3.2.  If  (3.7)  holds,  then  (3.2)  holds  for  estimates  given  by  (3.8). 
Proof.  Integrating  by  parts,  for  u  =  0,  1,  2,  3, 

oo 
-oo 

Therefore, 

oo 

l/^(x)-/.M(*)l  <r*2  J  kW(r(x  -  y))\Fn(y)  -  F,(y)\  dy 

-oo 
oo 

+  1/  r^kW(r(x  -y))F.(y)dy-f,M(x)\. 


(3.10) 


By (3.7) and the conditions on $k$, the supremum of the first term above is of order
$r^{\nu+1} \cdot O_p(n^{-a})$, which decreases as a power of $n$ by (3.9). For the second term, integrating by
parts yields
CO  oo

/  r^2k^Hr(x  -y))F,(y)dy  =  /  rk(r(x  -y))f^(y)dy 

-co  -co 

CO 

=  /  fc(«)/,M(jc  --)du 
4o  r 

CO 

-/.M<*)--  /  uk{u)fW{X{u))du 

r  4o 

Thus,  by  Lemma  2.2  and  the  conditions  on  k, 

CO 

supUn  1/  r"«*H*)(r(*  -j;))Ft(y)^  -/.M(y)|  =0 


5w-2 


and, hence, the supremum of the second term in (3.10) also decreases as a power of $n$. Thus,
(3.2) holds, in fact, with an error of order $n^{-a^*}$ with $a^* < \min(a_0, a - 4a_0)$ (where $a_0$ is defined
in (3.9)). ■

Lemma 3.3. If (3.7) holds with $B$ replaced by $3B$ in the definition of $U_n$ (3.1), then (3.3)
holds for estimates given by (3.8).

Proof. Let $U_n(B)$ denote the set $U_n$ in (3.1) with dependence on $B$ explicit, and define

$$D_n = \sup_{U_n(3B)} |\hat{F}_n(x) - F_s(x)| = O_p(n^{-a}).$$

Let $\varepsilon > 0$ be given and choose $n$ large enough so that by Lemma 2.1, (3.7), and (3.9),

±  +  DJMUnm  /.(*)<  cn^1  <  B 

for  some  ax  <  a0  and  constant  c,  with  probability  at  least  1  -  e.  Then  since  the  support  of  k  is 
contained  in  [-1,  1],  (3.8)  implies  that  for  y  e  Un(2B),  with  probability  at  least  1  -  e  (for  n 
large  enough), 

Fn(y)<Fn(y  +  -)<F,(y  +  -)  +  Dn 
r  r 

<  F,{y  +  —  +  Dn/infyn(3B)  /,(*)). 
Now  let  x  =y  +  j  +  Dn/MUn{3B)  /,(*).  Then  for  x  e  Un(B), 
x  -  j  -  Ai/inf^gs)  /.(x)  <  P,lF((x))


or 

with probability at least $1 - \varepsilon$ for $n$ large enough. The reverse inequality follows similarly;
and, hence, the result holds. ■

4.  Practical  Experience 

To assess the performance of adaptive L-estimation in practical applications, a small-scale
Monte Carlo experiment was conducted. Before describing the experiment in detail, we
should explicitly describe the version of the adaptive estimator (1.4) as it is employed in the
experiment.

In Section 3, it is shown that the estimator $\hat{F}_n(y) = \hat{F}_n(y \mid \bar{x})$ defined in (1.6) and
described in detail in Bassett and Koenker (1982) and Portnoy (1984) satisfies the condition

$$\sup_{U_n} |\hat{F}_n(y) - F_s(y)| = O_p(n^{-1/2}) \tag{4.1}$$

with $F_s(y)$ defined in Section 2 and $U_n$ given in (3.1), and further, that kernel density
estimators of $f_s$ and its derivatives based on $\hat{F}_n(y)$ can be used to achieve the sufficient condition
(2.7) for an adaptive $\hat{J}_n(t)$ required by the estimator defined in (1.4).

Rather than randomly perturbing the observed $y$'s as suggested by the theory of Sections
2 and 3, we have chosen instead to smooth $\hat{F}_n(y)$ directly by kernel methods. For appropriate
choice of the kernel, this may be viewed as taking expectations with respect to the
randomized estimator treated in Section 2, cf. Stone (1975). $\hat{F}_n(y)$ takes the form

$$\hat{F}_n(y) = \sum_{i=1}^{m} p_i\, I(\xi_i \le y) \tag{4.2}$$

for numbers $0 < p_1 < p_1 + p_2 < \cdots < \sum_{i=1}^{m} p_i \le 1$ and $\xi_1 < \xi_2 < \cdots < \xi_m$. So, we may write kernel
estimates of the density and its derivatives as


(4.3) 


i=i 


where $k(\cdot)$ denotes a proper kernel and $r_i^{-1}$, $i = 1, \ldots, m$, are local bandwidth numbers which
control the degree of smoothness of the estimate. The latter are chosen by the procedure
outlined in Silverman (1986, pp. 101-2). A pilot estimate, $\tilde{f}(x)$, of the density is constructed
based on a fixed bandwidth, say $h$. Then the local bandwidth factors,

$$\lambda_i = [\tilde{f}(\xi_i)/g]^{-a},$$

are computed with $\log g = \sum p_i \log \tilde{f}(\xi_i)$. The sensitivity parameter, $a$, controls the
responsiveness of the local bandwidths

$$r_{in} = (h\lambda_i)^{-1}$$

to the pilot density. We have adopted the (standard) choice $a = 1/2$ after some brief
experimentation with other values.

The choice of the kernel $k(\cdot)$ is critical to the success of the method. Guided by the
theory of Section 2 we have chosen the Cauchy kernel,

$$k(x) = (\pi(1 + x^2))^{-1},$$

which  has  the  salient  characteristic  that  it  tends  to  control  the  tail  behavior  of  our  estimated 
/(•)  much  more  successfully  than  more  conventional,  thinner-tailed  kernels. 
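The local-bandwidth procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: it assumes the density estimate has the usual adaptive-kernel form $\sum_i p_i r_i k(r_i(x - \xi_i))$ with $r_i^{-1} = h\lambda_i$, that the masses $p_i$ sum to one (so that $g$ is a weighted geometric mean), and all function names are hypothetical.

```python
import math


def cauchy_kernel(u):
    """Cauchy kernel k(u) = 1 / (pi * (1 + u^2))."""
    return 1.0 / (math.pi * (1.0 + u * u))


def fixed_kde(x, xi, p, h):
    """Pilot estimate: weighted kernel density with one fixed bandwidth h."""
    return sum(pi / h * cauchy_kernel((x - z) / h) for z, pi in zip(xi, p))


def local_bandwidths(xi, p, h, a=0.5):
    """Local bandwidths h * lambda_i with lambda_i = (pilot(xi_i) / g) ** (-a).

    g is the weighted geometric mean of the pilot density at the jump points.
    """
    pilot = [fixed_kde(z, xi, p, h) for z in xi]
    log_g = sum(pi * math.log(f) for pi, f in zip(p, pilot))
    g = math.exp(log_g)
    return [h * (f / g) ** (-a) for f in pilot]


def adaptive_kde(x, xi, p, h, a=0.5):
    """Adaptive estimate: each jump point gets its own bandwidth."""
    bw = local_bandwidths(xi, p, h, a)
    return sum(pi / b * cauchy_kernel((x - z) / b) for z, pi, b in zip(xi, p, bw))
```

Points in sparse regions have a low pilot density and therefore receive wider bandwidths, while setting `a=0` recovers the fixed-bandwidth pilot estimate.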


Given  the  estimates  (4.3),  it  is  natural  to  define 


$$\hat{J}_n(t_i) = \left(\frac{\hat{f}'(\xi_i)}{\hat{f}(\xi_i)}\right)^2 - \frac{\hat{f}''(\xi_i)}{\hat{f}(\xi_i)}, \qquad i = 1, 2, \ldots, m,$$

where $t_i = \sum_{j \le i} p_j$ is the cumulative mass associated with the quantile $\xi_i$. In theory and
practice it is essential to trim the tails of the weight function, so for a sequence $\alpha_n \to 0$ we
compute

$$\tilde{J}_n(t_i) = \tilde{p}_i \hat{J}_n(t_i) \Big/ \sum_j \tilde{p}_j \hat{J}_n(t_j) \tag{4.4}$$

with $\tilde{p}_i = \max(\min(t_i, 1 - \alpha_n) - \max(t_{i-1}, \alpha_n), 0)$.
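The trimming rule for the masses can be made concrete with a short sketch. It assumes the weights in (4.4) are the trimmed masses times the estimated scores, normalised to sum to one, and applies them to scalar step values for simplicity; the names `trimmed_masses` and `l_estimate` are hypothetical.

```python
def trimmed_masses(p, alpha):
    """Mass of each interval (t_{i-1}, t_i] that falls inside (alpha, 1 - alpha)."""
    masses, t_prev = [], 0.0
    for pi in p:
        t_i = t_prev + pi
        masses.append(max(min(t_i, 1.0 - alpha) - max(t_prev, alpha), 0.0))
        t_prev = t_i
    return masses


def l_estimate(values, p, scores, alpha):
    """Weighted average of step values with weights proportional to mass * score."""
    weights = [m * s for m, s in zip(trimmed_masses(p, alpha), scores)]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

With all scores equal to one the result is a trimmed mean of the step values: intervals wholly outside $(\alpha, 1 - \alpha)$ get zero weight and intervals straddling a trimming point are weighted by the part that remains inside.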

It remains only to describe the choice of (i) the initial window width, $h$; and (ii) the
trimming proportion $\alpha$. The latter is straightforward; we simply report results for both of the
traditional trimming proportions $\alpha = 0.05$ and $\alpha = 0.1$. The theory of Section 3 suggests that
$\alpha_n \to 0$ as a negative power of $\log n$; thus these traditional values should be reasonable for a
wide range of sample sizes (say $n < 1000$). The choice of $h$ is a delicate issue and warrants
considerable further investigation. We began with a conventional rule for density estimation,
see Silverman (1986, Section 3.4),

$$h = \kappa \min(s_1, s_2)/n^{1/5}$$

where $s_1$ and $s_2$ are alternative estimates of the dispersion of $\hat{F}_n(y)$: standard deviation, and
(interquartile range)/1.34, respectively, and $\kappa$ is a constant to be determined. The choice
$\kappa = .9$, tuned to minimizing integrated mean-squared error for the normal density, is clearly
inappropriate in the present instance. Virtually imperceptible bulges in $\hat{f}$ give rise to violent
oscillations in $\hat{J}$. We have adopted $\kappa = 2.5$ provisionally, although this tends to oversmooth to
a significant degree in some cases. In Figures 4.1 and 4.2 we illustrate several estimated $J$
functions for the Gaussian and Cauchy cases respectively for a bivariate linear model with
100 observations. The smooth curves in each case depict the "true" $J$.
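The initial window-width rule can be sketched in a few lines. This is a simplification under stated assumptions: the dispersion estimates are computed from an ordinary unweighted sample rather than from the weighted step function $\hat{F}_n$, the quartiles use the default method of Python's `statistics.quantiles`, and `initial_bandwidth` is a hypothetical name.

```python
import statistics


def initial_bandwidth(sample, kappa=2.5):
    """h = kappa * min(s1, s2) / n^(1/5).

    s1 is the sample standard deviation and s2 is the interquartile range
    divided by 1.34; kappa = 0.9 is the conventional normal-reference
    constant and 2.5 the provisional value discussed in the text.
    """
    n = len(sample)
    s1 = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    s2 = (q3 - q1) / 1.34
    return kappa * min(s1, s2) / n ** 0.2
```

Since the rule is linear in the constant, moving from 0.9 to 2.5 widens the pilot window by a factor of about 2.8, which is the deliberate oversmoothing used to damp oscillations in the estimated score function.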

We  should  emphasize  at  this  point  that  many  of  the  choices  described  above  may  be 
easily  criticized.  Indeed  the  choice  of  kernel  estimation  of  /  is  itself  questionable.  Cox 
(1986)  has  proposed  an  elegant  smoothing  spline  approach  to  the  estimation  of/  '(x)/f(x) 
which  may  prove  attractive  in  the  present  instance  as  well,  if  a  satisfactory  approach  can  be 
found  for  controlling  the  tail  behavior  of  the  estimator.  In  some  preliminary  experiments  we 
found  this  to  be  difficult.  Clearly,  many  alternatives  exist  to  the  particular  choice  of  initial 
and local bandwidths described above. We regard the current methods as simply illustrative
of one possible approach which yields quite promising results.

Figure 4.1: Three $\hat{J}$'s with Gaussian Errors

Figure 4.2: Three $\hat{J}$'s with Cauchy Errors

The experiment is (provisionally) limited to the bivariate linear model,

$$y_i = \alpha + \beta x_i + u_i$$

with the $x_i$ drawn as i.i.d. Gaussian and $u_i$ also i.i.d. from one of the distributions appearing in
Table 4.1. Since asymmetric distributions are of substantial interest we restrict attention to
the relative performance of several estimators of the slope parameter, $\beta$. To control computing
costs we restrict attention to only a few competing estimators.

Once the regression quantile process, $\hat\beta(t)$, implicitly defined in (1.3) has been computed,
it is easy to compute a variety of L-estimators. For example, the analogues of the trimmed
means

$$\hat\beta_\alpha = (1 - 2\alpha)^{-1} \int_{\alpha}^{1-\alpha} \hat\beta(t)\,dt,$$

termed "trimmed regression quantiles", are readily calculated as in (4.4) setting $\hat{J}_n(t) = 1$ on
$(\alpha, 1 - \alpha)$. These estimators are, asymptotically, closely related to the Huber M-estimators.
We consider three members of this family: TRQ(.5), the $\ell_1$-estimator; TRQ(.25), a regression
midmean; and TRQ(.1), the 10% trimmed regression quantile. In addition, for each case, we

l. 

2. 
3. 


Name 


Gaussian 
Cauchy 
Uniform 
Laplace 

Exponential 
Lognormal 

Bimodal 


Table  4.1 


Distributions,  Densities,  and  their  Optimal  7's 


Density1 


Wi+x2))"1 

Vi|(*) 

x  >  0 


MM* 

Vie- 


x_V0ogx) 
.5<t>(x  -  3)  +  .5<f>(x  +  3) 


Optimal  J2'3 


1 

2(i-<22oo)/(i  +  e2("))2 

.560(u)  +  .SS.iu) 

60(u) 
-log  (Q(u))/Q2(u) 
+  9    1  ,    MQ  ±  3)  -  4>{Q.  -  3))2 
(HQ  +  3)  +  4>{Q  -  3))2 


4>{x)  =  (»-£  e-*2l2 

6x(u)  denotes  the  Dirac  density  with  point  mass  1  at  x. 

Q  =  Q{u)  denotes  the  quantile  function  corresponding  to  the  density  given  in  column  2. 
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compute  the  optimal  L-estimator  using  the  J  function  appearing  in  Table  4.1.  Finally,  we 
compute  the  ordinary  least  squares  estimator  (OLS)  and  the  least  maximum  deviation  (LMD), 
or  /qq  estimator.  The  latter  is  the  maximum  likelihood  estimator  for  the  uniform  case,  the  sam- 
ple midrange  in  the  location  model,  and  is  readily  computed  by  linear  programming  methods. 
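The trimmed regression quantile described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it replaces the linear programming computation of the regression quantile process with a brute-force search over pairs of observations (adequate only for small bivariate samples), and it approximates the integral over $(\alpha, 1-\alpha)$ by a midpoint rule. The function names and the `grid` parameter are assumptions introduced here for illustration.

```python
import itertools


def check_loss(residual, t):
    """Koenker-Bassett check function rho_t(r) = r * (t - 1{r < 0})."""
    return residual * (t - (1.0 if residual < 0 else 0.0))


def regression_quantile(x, y, t):
    """Brute-force t-th regression quantile (intercept, slope) for y = a + b*x.

    An optimal solution of the underlying linear program interpolates two
    observations, so scanning all pairs of points suffices for small n.
    Assumes at least two distinct x values.
    """
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line: no finite slope
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - a - b * xk, t) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def trimmed_rq_slope(x, y, alpha, grid=20):
    """Midpoint-rule approximation to (1-2a)^{-1} * int_a^{1-a} slope(t) dt."""
    if alpha >= 0.5:
        return regression_quantile(x, y, 0.5)[1]  # TRQ(.5): the l1 estimator
    width = (1.0 - 2.0 * alpha) / grid
    ts = [alpha + (k + 0.5) * width for k in range(grid)]
    return sum(regression_quantile(x, y, t)[1] for t in ts) / grid
```

For example, `trimmed_rq_slope(x, y, 0.25)` approximates the regression midmean slope and `trimmed_rq_slope(x, y, 0.5)` returns the $\ell_1$ slope. The pairwise search costs on the order of $n^3$ operations per quantile, so an LP-based routine would be preferable at the sample sizes used in the experiment.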

In Table 4.2 we report Monte Carlo relative efficiencies for each of these estimators
based on 1000 trials. The reported efficiencies are relative to the optimal L-estimator in each
case defined by the J function given in Table 4.1. The random number generator was the
portable version of the Marsaglia "superduper" generator as implemented in S (Becker and
Chambers (1985)), so results should be reproducible (up to differences in machine precision)
across machines given the seeds used here.
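The relative efficiency calculation can be sketched as follows. This is an illustrative outline only, assuming efficiency is the ratio of Monte Carlo mean squared errors, mse(reference)/mse(competitor), so that values above one favor the competitor. The helper names (`simulate_slopes`, `draw_error`, `estimator`) are hypothetical, and Python's built-in generator stands in for the Marsaglia generator used in the paper, so the published figures would not be reproduced exactly.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope for y = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx


def mse(estimates, truth):
    """Monte Carlo mean squared error of a list of estimates."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


def relative_efficiency(reference, competitor, truth):
    """mse(reference) / mse(competitor); > 1 means the competitor did better."""
    return mse(reference, truth) / mse(competitor, truth)


def simulate_slopes(estimator, draw_error, n=100, trials=1000,
                    alpha=0.0, beta=1.0, seed=0):
    """Slope estimates from repeated draws of y_i = alpha + beta*x_i + u_i.

    x_i is standard Gaussian; draw_error(rng) returns one error u_i.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [alpha + beta * xi + draw_error(rng) for xi in x]
        slopes.append(estimator(x, y))
    return slopes
```

A typical call would generate two lists of slope estimates with the same seed, for instance one from a reference estimator and one from `ols_slope`, and pass both to `relative_efficiency` together with the true slope.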

Table 4.2
Finite Sample Efficiencies of Various Estimators of the Bivariate Linear Model (1)

                                        Estimator
Distribution    ARQ.05   ARQ.10   TRQ.10   TRQ.25   TRQ.5   $\ell_2$   $\ell_\infty$

Normal            .96      .95      .96      .87     .65      1.02        .11
Cauchy            .89      .87      .47      .85     .84       .00        .00
Laplace          1.01     1.02      .90     1.04    1.00       .68        .03
Uniform           .32      .28      .26      .18     .12       .33       1.64
Exponential       .16      .15      .08      .08     .06       .06        .04
Lognormal         .27      .23      .09      .11     .09       .03        .02
Bimodal          2.30     2.09      .74      .42     .10       .64        .59

(1) Sample size of the linear model is 100. All entries are reported relative to the "optimal
L-estimator," e.g., mse($\hat\beta_{L_{opt}}$)/mse($\hat\beta_{ARQ.05}$).

A number of anomalies in Table 4.2 should be addressed immediately. Several estimators
have efficiencies greater than one, implying that they performed better than the asymptotically
optimal L-estimator. This is the case for the ordinary least squares ($\ell_2$) estimator in the
Gaussian case. Here the optimal L-estimator is an untrimmed mean of the regression quantiles
and suffers a slight deficiency (2%) relative to the classical least squares estimator.
Perhaps more surprisingly, the regression midmean (TRQ.25) outperforms the $\ell_1$-estimator for
the Laplace (double-exponential) distribution, but again only slightly. In the uniform
case, the optimal M-estimator is the $\ell_\infty$-estimator and it substantially outperforms the optimal
L-estimator. Here the rates of convergence are non-standard, so perhaps this disparity is not
so surprising. More surprising is the poor performance of the "optimal" L-estimator in the
bimodal mixture of normals. Here the optimal weight function looks like the "untrimmed
regression quantile mean" except that the central quantiles are drastically downweighted.
Clearly, the estimated J's deliver superior performance in this case, but the explanation is
somewhat mysterious.
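The downweighting of the central quantiles in the bimodal case can be checked by a short calculation. The sketch below assumes that the tabulated weight function is $J(u) = -(\log f)''(Q(u))$, which reproduces the Gaussian and Cauchy rows of Table 4.1; the normalization is an assumption made for this illustration.

```latex
\begin{aligned}
f(x) &= \tfrac12\,\phi(x-3) + \tfrac12\,\phi(x+3),
  \qquad \frac{\phi(x-3)}{\phi(x+3)} = e^{6x}, \\
(\log f)'(x) &= -x + 3\,\frac{\phi(x-3)-\phi(x+3)}{\phi(x-3)+\phi(x+3)}
  = -x + 3\tanh(3x), \\
(\log f)''(x) &= -1 + 9\bigl(1-\tanh^2(3x)\bigr), \\
J(u) &= -(\log f)''\bigl(Q(u)\bigr) = -8 + 9\tanh^2\bigl(3\,Q(u)\bigr).
\end{aligned}
```

Under this assumption the weight equals $-8$ at the median, where $Q(1/2) = 0$, and approaches $1$ in both tails, so the weight function behaves like the untrimmed mean away from the center and is strongly negative near it.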

As  the  theory  predicts,  the  adaptive  L-estimators  offer  good  performance  over  the  entire 
range  of  distributions  investigated.  To  our  delight,  they  are  particularly  successful  in  the 
asymmetric  and  bimodal  cases.  But  they  offer  high  efficiency  in  the  more  familiar  symmetric 
unimodal  cases  as  well.  Finally,  we  must  emphasize  that  these  results  are  based  on  a  relatively 
small  number  of  replications  and  very  little  experimentation  with  the  smoothing  methods 
employed to estimate the J functions. In future work we hope to report more extensive
experimental results.

Appendix A
A Uniform Bahadur Representation for Regression Quantiles with Explicit Bounds.


Basically, the proof of Theorem 2.1 of Koenker and Portnoy (1986) will be followed
exactly with bounds expressed explicitly as functions of the distribution and interval
$(a, 1-a)$. However, this requires the result of Lemma 2.1 of Portnoy (1984) showing that
$\|\hat\beta\| = O_p((\log n/n)^{1/2})$. To obtain explicit bounds, condition (2.10) of Portnoy (1984)
must be replaced by condition X4 as described in Proposition 3.2 of Portnoy (1984) (with some
modification of the argument). The conditions required here are the following.

A1:    Conditions X1-X4 hold and the density $f$ is continuous, bounded, and strictly positive.

A2:    In addition to A1, the derivative $f'$ exists and is uniformly bounded.

Note that for $F \in \mathcal{F}$, $F_t$ satisfies A2; and, hence, Theorem A.1 below holds for $F_n$ and $\hat\beta$
defined by (1.6) and (1.3) (based on observations $Y_i$ in (2.1)) and for $F_t$ given by (2.3) for any
$F \in \mathcal{F}$. Following the proofs of Lemma 2.1 and Proposition 2.2 (Portnoy, 1984) and keeping
careful track of explicit bounds yields the following results:

Lemma A.1. Assume condition A1. Then there exist $n_0$ and constants $b_i(X)$ depending
only on the constants in conditions X1-X4 such that for $n \ge n_0$,

$$\|\hat\beta\| \le K(X, f)\, R\, (\log n/n)^{1/2} \tag{A.1}$$

where

$$P\{|R| > w\} \le e^{-b_1(X)(w-1)^2 \log n} \quad (\text{for } w > 2), \qquad K(X, f) = b_2(X) \big/ \inf_{a,\, b_3(X)} f(t).$$

Here, we define

$$\inf_{a,b} f(t) = \inf\{f(t) : F^{-1}(a) - b \le t \le F^{-1}(1-a) + b\}. \tag{A.2}$$
The results of Koenker and Portnoy (1986) can also be extended by providing firm
bounds in terms of the density, $f$, and the constants in X1-X4. Again with $b_i(X)$ denoting
constants (depending only on X1-X4), careful consideration of the proofs in Koenker and
Portnoy (1986) yields the following results:

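Lemma A.2. Assume condition A2 and define, for $\delta \in R^p$ and $0 < \theta < 1$,

$$T(\delta, \theta) = \sum_{i=1}^n x_i \{ I(u_i < F^{-1}(\theta) + x_i'\delta) - I(u_i < F^{-1}(\theta)) \}, \qquad
\tilde T(\delta, \theta) = T(\delta, \theta) - E\,T(\delta, \theta). \tag{A.3}$$

Then, for $\delta \in \Delta = \{\delta : \|\delta\| \le K(\log n/n)^{1/2}\}$ and $a \le \theta \le 1-a$,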

\ET(6,9)-nQ6f(F-\d))\  <  K(K  +  l^^Ksup^U)  +  supz  \f  \x)\)  ■  n"\\og  nfl2 
and 


P\       sup       \\T (6,  6)\\  >  (/i »/4  log  n)K*b2(X){supxf(x)  +  sup,  \f  '{x)\) 

<  Kexp  {b3(X)  supxf(x)-  (log  «)}  +  -±=r  {2sup,  f  (x)  /infj  (t)  +  b4(X)). 

Combining Lemmas A.1 and A.2 yields,
Lemma  A.3.  Under  A2, 


P\    sup     \\T((0,V,6)\\>n^(logn)B(X,F)  \<q(X,F) 

a<0<l-a 


where 


D(v   r.      b1(X){supxf(x)  +  supx\f'(x)\) 
B(X,  F)  =  

0'«/«,62(X)  /(')}* 


q{X,  F)  =  -^=-  exp  {bA(X)  sup  f  {X))/[;m^hb(x)f  (t)). 


Lastly, as a consequence we have

Theorem A.1. Under condition A2, with B(X, F) and q(X, F) defined above,
$$\sup_{a \le \theta \le 1-a} \Bigl| F_n(F^{-1}(\theta)) - \frac{1}{n} \sum_{i=1}^n I(u_i < F^{-1}(\theta)) \Bigr| \le n^{-3/4} (\log n)\, B(X, F)$$

and 

sup    HetfO?)  "  l)f(F-\B))  -  -  2  *,•(*  -  /(«,-  <  f-1^)))!! 

a<<Xl-a  «     ,-  =  1 

<  n^*  (log  n)B(X,  F)  +  bl(X)/n 
except on a set with probability bounded above by q(X, F).